A statistical coarticulatory model for the hidden vocal-tract-resonance dynamics
Abstract
A statistical coarticulatory model is presented for spontaneous speech recognition. Knowledge of the dynamic, target-directed behavior of the vocal tract resonances (VTRs) responsible for producing highly coarticulated speech is incorporated into the recognizer design, training, and likelihood computation. The principal advantage of the new speech model over the conventional HMM is its compact internal structure, which parsimoniously represents long-span context dependence in the observable domain of speech acoustics without using additional context-dependent model parameters. The new model is formulated mathematically as a constrained, nonstationary, and nonlinear dynamic system, for which a version of the generalized EM algorithm is developed and implemented to learn the compact set of model parameters automatically. Speech recognition experiments using spontaneous speech data from the SWITCHBOARD corpus are reported.
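The abstract describes the hidden dynamics only at a high level. As a minimal sketch of what "target-directed" VTR behavior can mean, the code below assumes a first-order recursion in which the hidden resonance moves a fixed fraction of the way toward the active segment's target on each frame, with the state carried across segment boundaries. That carried-over state is what lets a segment's trajectory depend on its left context without any extra context-dependent parameters. The function names `simulate_vtr` and `observe`, the rate `r`, the starting value `z0`, and the log-shaped observation mapping are all hypothetical illustrations, not the paper's formulation. The generalized EM training is not shown.

```python
import math
import random


def simulate_vtr(targets, durations, r=0.9, z0=500.0, noise_std=0.0, seed=0):
    """Simulate a 1-D hidden VTR trajectory (e.g. one resonance, in Hz).

    Hypothetical first-order target-directed recursion:
        z[k+1] = r * z[k] + (1 - r) * T_s + w[k]
    where T_s is the target of the segment active at frame k and w[k] is
    Gaussian noise. The state z is NOT reset at segment boundaries, so each
    segment's trajectory depends on the preceding context (coarticulation).
    """
    rng = random.Random(seed)
    z = z0
    traj = []
    for target, dur in zip(targets, durations):
        for _ in range(dur):
            # Move a fraction (1 - r) of the remaining distance toward the target.
            z = r * z + (1.0 - r) * target + rng.gauss(0.0, noise_std)
            traj.append(z)
    return traj


def observe(traj, scale=1000.0):
    """Placeholder nonlinear mapping h(z) from hidden VTR to an acoustic feature."""
    return [math.log(1.0 + z / scale) for z in traj]


if __name__ == "__main__":
    # The same final segment (target 700 Hz) after two different left contexts.
    after_low = simulate_vtr([300, 700], [10, 3])
    after_high = simulate_vtr([1100, 700], [10, 3])
    print("after low context: ", [round(v, 1) for v in after_low[-3:]])
    print("after high context:", [round(v, 1) for v in after_high[-3:]])
    print("observed features: ", [round(o, 3) for o in observe(after_low[-3:])])
```

The final segment has the same target in both runs, yet its frames differ because the hidden state enters the segment from different places. Switching targets between segments makes the system nonstationary, and `observe` stands in for the nonlinear VTR-to-acoustics mapping.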
Similar resources
Spontaneous speech recognition using a statistical coarticulatory model for the vocal-tract-resonance dynamics.
A statistical coarticulatory model is presented for spontaneous speech recognition, where knowledge of the dynamic, target-directed behavior in the vocal tract resonance is incorporated into the model design, training, and likelihood computation. The principal advantage of the new model over the conventional HMM is the use of a compact, internal structure that parsimoniously represents long-...
Coarticulation modeling by embedding a target-directed hidden trajectory model into HMM - model and training
We propose and evaluate a new acoustic model that combines HMM and a special type of the hidden dynamic model (HDM) – a target-directed hidden trajectory model – into a single integrated model named HTHMM. The new model provides a computational model of coarticulation by representing the internal dynamics of human speech based on the hidden trajectory of the vocal-tract resonances. This paper f...
A Generative Modeling Framework for Structured Hidden Speech Dynamics
We outline a structured speech model, as a special and perhaps extreme form of probabilistic generative modeling. The model is equipped with long-contextual-span capabilities that are missing in the HMM approach. Compact (and physically meaningful) parameterization of the model is made possible by the continuity constraint in the hidden vocal tract resonance (VTR) domain. The target-directed VT...
Statistical multi-stream modeling of real-time MRI articulatory speech data
This paper investigates different statistical modeling frameworks for articulatory speech data obtained using real-time (RT) magnetic resonance imaging (MRI). To quantitatively capture the spatio-temporal shaping process of the human vocal tract during speech production, a multi-dimensional stream of direct image features is extracted automatically from the MRI recordings. The features are close...
Faster 3d vocal tract real-time MRI using constrained reconstruction
Real-time magnetic resonance imaging (rtMRI) is a valuable emerging tool for studying the dynamics of vocal production. Conventional 2D rtMRI typically images the midsagittal plane of the vocal tract, acquiring data from all the important articulators. Dynamic 3D MRI would be a major advance, as it would provide 3D visualization of the vocal tract shaping dynamics, especially for the modeling o...